179 research outputs found

    Near-Neighbor Preserving Dimension Reduction for Doubling Subsets of l_1

    Get PDF
    Randomized dimensionality reduction has been recognized as one of the fundamental techniques in handling high-dimensional data. Starting with the celebrated Johnson-Lindenstrauss Lemma, such reductions have been studied in depth for the Euclidean (l_2) metric, but much less for the Manhattan (l_1) metric. Our primary motivation is the approximate nearest neighbor problem in l_1. We exploit its reduction to the decision-with-witness version, called approximate near neighbor, which incurs a roughly logarithmic overhead. In 2007, Indyk and Naor, in the context of approximate nearest neighbors, introduced the notion of nearest neighbor-preserving embeddings. These are randomized embeddings between two metric spaces with guaranteed bounded distortion only for the distances between a query point and a point set. Such embeddings are known to exist for both l_2 and l_1 metrics, as well as for doubling subsets of l_2. The case that remained open were doubling subsets of l_1. In this paper, we propose a dimension reduction by means of a near neighbor-preserving embedding for doubling subsets of l_1. Our approach is to represent the pointset with a carefully chosen covering set, then randomly project the latter. We study two types of covering sets: c-approximate r-nets and randomly shifted grids, and we discuss the tradeoff between them in terms of preprocessing time and target dimension. We employ Cauchy variables: certain concentration bounds derived should be of independent interest

    Products of Euclidean Metrics and Applications to Proximity Questions among Curves

    Get PDF
    International audienceThe problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to pointsets whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments and molecular backbones to time-series in general dimension. For p-products of Euclidean metrics, for any p ≥ 1, we design simple and efficient data structures for ANN, based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves, which generalizes both discrete Fréchet and Dynamic Time Warping distances. These are the most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increasing space usage and preprocessing time over existing methods. Query time complexity is comparable or significantly improved by our algorithms; our approach is especially efficient when the length of the curves is bounded. 2012 ACM Subject Classification Theory of computation → Data structures design and analysi

    Enumerating a subset of the integer points inside a Minkowski sum

    Get PDF
    AbstractSparse elimination exploits the structure of algebraic equations in order to obtain tighter bounds on the number of roots and better complexity in numerically approximating them. The model of sparsity is of combinatorial nature, thus leading to certain problems in general-dimensional convex geometry. This work addresses one such problem, namely the computation of a certain subset of integer points in the interior of integer convex polytopes. These polytopes are Minkowski sums, but avoiding their explicit construction is precisely one of the main features of the algorithm. Complexity bounds for our algorithm are derived under certain hypotheses, in terms of output-size and the sparsity parameters. A public domain implementation is described and its performance studied. Linear optimization lies at the inner loop of the algorithm, hence we analyze the structure of the linear programs and compare different implementations
    • …
    corecore